
review: Rerun of PR #1120 (Rascal) on 8xH100 SXM#1177

Open
dexhunter wants to merge 1 commit into openai:main from dexhunter:rerun/pr1120-rascal-reproduction

Conversation

@dexhunter

Summary

Independent rerun of PR #1120 (Rascal, val_bpb 1.1099) on 8xH100 SXM (GCP).

Ran the submitted train_gpt.py from commit 39ed402 with SKIP_GPTQ=1, as specified in PR #1120's README reproduction instructions.

Rerun Result

| Metric | Published (seed 300) | Rerun (seed 1337) | Delta |
| --- | --- | --- | --- |
| final_sliding_window_exact val_bpb | 1.10979 | 1.11350 | +0.00371 |
| final_sliding_window_exact val_loss | 1.87383 | 1.88010 | +0.00627 |
| Steps | 6,593 | 6,881 | +288 |
| step_avg | ~91 ms | 87.2 ms | −3.8 ms |

The rerun val_bpb is +0.00371 worse than the published seed 300 result. This gap is approximately 7× typical seed variance (~0.0005 std) and 17× the published 3-seed std (0.00021).
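The gap-to-variance ratios can be checked with a quick sketch. The values below come from the table above; the ~0.0005 typical seed std is the approximate figure quoted in this PR, not a measured quantity:

```python
# Sanity-check the rerun gap against seed variance (numbers from this PR).
published = 1.10979        # seed 300 val_bpb from the submission
rerun = 1.11350            # seed 1337 val_bpb from this rerun
gap = rerun - published

typical_seed_std = 0.0005      # approximate figure for run-to-run seed noise
published_3seed_std = 0.00021  # std reported across the submission's 3 seeds

print(f"gap = {gap:+.5f} bpb")
print(f"{gap / typical_seed_std:.1f}x typical seed std")
print(f"{gap / published_3seed_std:.1f}x published 3-seed std")
```

This puts the gap at roughly 7.4x the typical seed std and 17.7x the published 3-seed std, consistent with the "7×" and "17×" figures above.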

Environment

  • Hardware: 8× H100 80GB SXM (GCP a3-highgpu-8g)
  • Driver: 565.57.01
  • Python: 3.12.13
  • PyTorch: 2.9.1+cu128
  • NCCL_NET: Socket (required on GCP)
  • Command: NCCL_NET=Socket SKIP_GPTQ=1 torchrun --standalone --nproc_per_node=8 train_gpt.py

Observations

  1. The rerun achieves more training steps (6,881 vs 6,593) due to a faster step time (87.2 ms vs ~91 ms), yet the final result is significantly worse.

  2. The submitted train_gpt.py does not contain quantization code. It outputs final_model.pt (raw state dict) and computes final_sliding_window_exact on the unquantized model. The int6+zstd quantization and final_int6_roundtrip metrics visible in the published seed logs appear to be produced by an external runner rather than by train_gpt.py itself.

  3. The reported 3-seed metric (val_bpb 1.1099) corresponds to final_sliding_window_exact, which is measured on the pre-quantization model.
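For context on what a round-trip metric like final_int6_roundtrip could involve, here is a minimal symmetric per-tensor int6 quantize/dequantize sketch. This is an illustrative guess, not the submission's actual quantizer; the function name, per-tensor scaling scheme, and clipping range are all assumptions:

```python
import numpy as np

def int6_roundtrip(x: np.ndarray) -> np.ndarray:
    """Symmetric per-tensor int6 quantize/dequantize (illustrative sketch)."""
    scale = np.abs(x).max() / 31.0  # signed 6-bit range is [-32, 31]
    if scale == 0.0:
        return x.copy()
    q = np.clip(np.round(x / scale), -32, 31).astype(np.int8)
    return q.astype(np.float32) * scale

w = np.random.randn(256, 256).astype(np.float32)
err = np.abs(int6_roundtrip(w) - w).max()
print(f"max abs round-trip error: {err:.4f}")
```

A real runner would apply something like this per weight tensor before re-evaluating val_bpb, which is what makes the pre- vs post-quantization distinction in observation 3 matter.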

Files

  • RERUN_NOTES.md — detailed notes
  • RERUN_seed1337.log — full rerun output log

This rerun is provided for community transparency, following the precedent of PR #1126 (rerun of PR #1089).

Ran the submitted train_gpt.py (commit 39ed402) with SKIP_GPTQ=1 on GCP 8xH100.
Result: final_sliding_window_exact val_bpb 1.11350 vs published 1.10979 (seed 300).
Gap: +0.00371 bpb, about 7x larger than typical seed variance (~0.0005).

Note: train_gpt.py contains no quantization code; the published int6+zstd
metrics appear to come from an external runner.
newjordan pushed a commit to newjordan/parameter-golf-1 that referenced this pull request Mar 31, 2026
… script

The 2159-line rascal_master (no quantization) was mistakenly committed to
records/ instead of the 2468-line script that produced the submission logs.
The correct file includes int6+zstd quantization, GPTQ skeleton, and zstandard
compression — matching bytes_code=118521 reported in submission.json and logs.

Addresses reproducibility concern raised in PR openai#1177.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@newjordan

newjordan commented Mar 31, 2026

Sorry man, my agent replaced the file in git while I was doing optimizations last night. I re-uploaded the proper file. I've got my hands in three tests at any given time and it gets messy in my lab.

I've been working on model quality, not wind-down, so I had chopped the wind-down code for my testing. It should not have been pushed.

If it makes you feel any better, have an agent scrape my notes and ablations from yesterday and you'll have a bunch more data =) I'm working in the open.

